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SOFTWARE SET-VALUE PROFILING AND CODE REUSE 



Eidd 

5 The present invention relates generally to software, and more specifically to 

software capable of reusing regions of code. 



Background of the Invention 

Modem software programs include many instructions that are executed 

1 0 multiple times each time the program is executed. Typically, large programs have 
logical "regions" of instructions, each of which may be executed many times. When 
a region is one that is executed more than once, and the results produced by the 
region are the same for more than one execution, the region is a candidate for 
"reuse." The term "reuse" refers to the reusing of results from a previous execution 

15 of the region. 

For example, a computation reuse region could be a region of software 
instructions that, when executed, read a first set of registers and modify a second set 
of registers. The data values in the first set of registers are the "inputs" to the 
computation reuse region, and the data values deposited into the second set of 

20 registers are the "results" of the computation reuse region. A buffer holding inputs 
and results can be maintained for the region. Each entry in the buffer is termed an 
"instance." When the region is encountered during execution of the program, the 
buffer is consulted and if an instance with matching input values is found, the results 
can be used without having to execute the software instructions in the computation 

25 reuse region. When reusing the results is faster than executing the software 
instructions in the region, performance improves. Such a buffer is described in: 
Daniel Connors & Wen-mei Hwu, "Compiler-Directed Dynamic Computation 
Reuse: Rationale and Initial Results," Proceedings of the 32nd Annual Intemational 
Symposium on Microarchitecture (MICRO), November 1999. 

30 Some regions make better candidates for reuse than others. For example, a 

region capable of producing an often-reused instance is a good candidate for reuse. 



In contrast, regions that produce instances that are not reused often generally do not 
make good candidates for reuse, in part because new instances are frequently 
generated, and buffered instances are not often reused. Regions that are candidates 
for reuse are typically identified when the program is compiled. The compiler 
5 identifies candidates for reuse, and selects which candidates are to be computation 
reuse regions after the program is compiled. This can be a difficult problem, in part 
because the compiler does not necessarily have information describing whether 
candidate regions have the quaUties that make for good reuse regions. 

Some compilers use value profiling algorithms in an attempt to identify 

10 variables with invariant behavior. One such value profiling algorithm is discussed 
in: Brad Calder, Peter Feller & Alan Eustace, "Value Profiling," Proceedings of the 
30th Annual International Symposium on Microarchitecture (MICRO), December 
1997. Calder et al. present a technique that attempts to identify variables with 
invariant behavior by observing each variable accessed by instructions. Calder et al. 

1 5 also present a technique that observes each variable for a period of time and then 
tests for convergence. This approach can incur significant overhead, in part because 
every value generated by every instruction is profiled. Value profiling as described 
by Calder et al. is not directly appUcable to the identification of reuse regions, in part 
because regions often have inputs and outputs that include multiple variables. 

20 For the reasons stated above, and for other reasons stated below which will 

become apparent to those skilled in the art upon reading and understanding the 
present specification, there is a need in the art for an altemate method and apparatus 
for identifying and profiling candidate reuse regions. 

25 Brief Description of the Drawing s 

Figure 1 shows a candidate reuse region; 

Figure 2 shows input values and set-values for the candidate reuse region of 
Figure 1; 

Figure 3 A shows a set- value in accordance with one embodiment of the 
30 invention; 
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Figure 3B shows a set-value in accordance with another embodiment of the 
invention; 

Figure 4 shows a profiUng data structure; 

Figure 5A shows a sampling value profiler; 
5 Figure 5B shows instrumenting code that implements the sampling value 

profiler of Figure 5 A; 

Figure 6A shows software instructions that access an array; 

Figure 6B shows a location- value in accordance with one embodiment of the 
present invention; 

10 Figure 6C shows a location-value in accordance with another embodiment of 

the present invention; 

Figure 6D shows a location-value profiling data structure; 

Figure 7 shows a flowchart for a method of selecting reuse regions; and 

Figure 8 shows a processing system. 

15 

Description of Rmhodiments 

In the following detailed description of the embodiments, reference is made 
to the accompanying drawings that show, by way of illustration, specific 
embodiments in which the invention may be practiced. In the drawings, like 

20 numerals describe substantially similar components throughout the several views. 
These embodiments are described in sufficient detail to enable those skilled in the art 
to practice the invention. Other embodunents may be utiUzed and structural, logical, 
and electrical changes may be made without departing from the scope of the present 
invention. Moreover, it is to be understood that the various embodiments of the 

25 invention, although different, are not necessarily mutually exclusive. For example, a 
particular feature, structure, or characteristic described in one embodiment may be 
included within other embodiments. The following detailed description is, therefore, 
not to be taken in a limiting sense, and the scope of the present invention is defined 
only by the appended claims, along with the fixU scope of equivalents to which such 

30 claims are entitled. 
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The method and apparatus of the present invention provide a profiUng 
mechanism useful for forming computation reuse regions from a list of candidate 
reuse regions. A computation reuse region has a number of input registers. Values 
held in the input registers are input values to the region when the region is executed. 

5 When the input registers for a candidate reuse region only take on a few sets of 
different values, the candidate reuse region can be profitably reused. Each set of 
different values, hereinafter referred to as "set-values," corresponds generally to a 
reuse instance that will be generated at runtime. Candidate reuse regions exhibiting 
this behavior can be profitably reused, in part because a small number of instances 

1 0 can be reused often. 

The profiling mechanism described herein is also appUcable for selecting 
load instructions for reuse. Some instructions load values from different addresses. 
Each value loaded from each location is referred to as a "location-value." When a 
load instruction consistently accesses a small number of location-values, the load 

1 5 instruction may be profitably included within a reuse region. The profiling 
mechanism described herein can be used to profile location-values in a manner 
similar to that used for profiling set-values as described above. From the profile for 
location- values, an estimate can be generated for the likelihood that load values will 
be overwritten by stores. 

20 Figure 1 shows a candidate reuse region. A candidate reuse region is a region 

that can be made into a computation reuse region, but may or may not be a "good" 
computation reuse region. For the purposes of this description, a good computation 
reuse region is one which produces instances that are reused often. 

Candidate reuse region 100 is shown having multiple instructions, including 

25 instructions 104, 106, and 108. Instructions 104 and 108 have registers one and two 
(rl and r2) as operands. Likewise, instruction 106 has registers three and four (r3 
and r4) as operands. Input registers 102 are shown above candidate reuse region 100 
to show that registers rl, r2, and r3 are inputs to the region. 

Candidate reuse region 100 has registers 102 as input registers because the 

30 first two instructions (instructions 104 and 106) depend upon values held in the input 
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registers, and the input register values are undisturbed before being used within the 
region. Regions with small numbers of inputs and outputs are selected as candidate 
reuse regions. 

A candidate reuse region 100 can also include loop constructs. Software 
5 loops can be identified as candidate reuse regions when they have small numbers of 
input and output registers. For ease of explanation, candidate reuse region 100 is not 
shown as a loop, but rather is shown as a linear sequence of instructions. Input 
registers 102 are the "input set" to candidate reuse region 100. 

Figure 2 shows individual input values and set- values for the candidate reuse 

10 region of Figure L Table 200 shows top- values for input register rl in column 202, 
top-values for input register r2 in column 204, and top-values for input register r3 in 
column 206, The term "top-values" as used herein refers to values that occur most 
frequently. For example, in table 200, value vl 1 occurs most frequently in input 
register rl at the beginning of candidate reuse region 100. Likewise, value v21 

1 5 occurs most frequently in input register r2, and value v3 1 occurs most frequently in 
input register r3. Each of columns 202, 204, and 206 show individual top-values for 
each of input registers rl, r2, and r3 at the beginning of candidate reuse region 100. 

Column 208 shows top set- values for the input set. In this example, the input 
set consists of input registers 102 (Figure 1). As can be seen in column 208, the top 

20 set-value corresponds to the combination of individual values for each input register. 
For example, the top set- value corresponds to a value of vl 1 in input register rl, a 
value of v21 in input register r2, and a value of v31 in input register r3. As can also 
be seen in column 208, the next most frequently occurring top set-value includes a 
mix of individual top- values. For example, the next most frequently occurring top 

25 set-value corresponds to a value of vl 1 in input register rl , a value of v22 in input 
register r2, and a value of v33 in input register r3. 

As previously described, if a small number of top set-values account for a 
majority of the set- values at the beginning of a candidate reuse region, the candidate 
reuse region is a good choice for forming a computation reuse region. Individual 

30 top-values, such as those shown in columns 202, 204, and 206 do not map directly to 
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top set- values as shown in column 208. The method and apparatus of the present 
invention directly profiles top set-values such as those shown in column 208. As a 
result, candidate reuse regions, such as candidate reuse region 100 (Figure 1) can be 
selected as computation reuse regions when profitable. 
5 Figure 3 A shows a set-value in accordance with one embodiment of the 

invention. Set- value 300 includes three values that correspond to values in three 
input registers to a candidate reuse region. VI 302, v2 304, and v3 306 are 
concatenated to produce set-value 300. Set-value 300, as shown in Figure 3A, does 
not include register names because register names can be inferred from the relative 

10 placement of values 302, 304, and 306. 

The size of set-value 300 is equal to the sum of the sizes of values 302, 304, 
and 306. As the number of input registers increases, the size of set- value 300 also 
increases. As the size of set-value 300 increases, the storage requirements for 
profiling a large number of candidate reuse regions can become large. 

1 5 Figure 3B shows a set-value in accordance with another embodiment of the 

invention. Embodiment 350 shows set-value 370 generated as a fionction of value vl 
352, value v2 354, and value v3 356. In embodiment 350, value 352 is shown 
having three segments. Each segment represents a portion of the total value, such as 
a single byte in a three byte word. Segments of value 352 are combined, or "folded," 

20 using exclusive-or operator 358. Likewise, segments of value v2 are folded using 
exclusive-or operator 360, and segments of value 356 are folded using exclusive-or 
operator 362. The output of exclusive-or operators 358, 360, and 362 are 
concatenated to produce set-value 370. Set-value 370 represents the combination of 
the values of the input set of a candidate reuse region. Unlike set-value 300 (Figure 

25 3A), set-value 370 does not necessarily grow in size as the number of values 

increases. For example, if the number of values increases beyond three, additional 
exclusive-or operators can be employed to combine the additional values prior to 
concatenation into set-value 370. 

Set-value 370 may not be unique for each possible combination of values 

30 352, 354, and 356. For example, two different combinations of values may produce 
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the same set-value 370. This can decrease the accuracy of the resulting profile 
generated; however, the degraded accuracy is traded for increased storage efficiency. 
In practice, most profiled values are small, and the likelihood that two profiled input 
sets result in the same set-value 370 is small. For example, if each of values 352, 
5 354, and 356 only have non-zero values in the left-most segment, then no data is lost 
as a result of the exclusive-or folding operations, and each set-value will be unique. 

Set- value 370 is shown in Figure 3B as being generated fi*om values folded 
using exclusive-or operators. One skilled in the art will understand that other 
mechanisms exist for folding and combining multiple values into set-value 370. 

10 When exclusive-or operators or other mechanisms are employed, multiple values are 
combined into a single value, shown as set-value 370 in Figure 3B. 

In some embodiments, exclusive-or operators 358 and 360 are implemented 
in hardware. In some hardware implementations, registers intemal to a processor 
drive exclusive-or circuits that create set-values, such as set-value 370. In other 

15 embodiments, exclusive-or operators 358 and 360 are implemented in software. In 
some software implementations, exclusive-or operators 358 and 360 appear as 
exclusive-or machine instructions inserted into the software as instrumenting code. 

In general, N input values can be combined into a set- value that is less than N 
input words long. The combining techniques shown in Figures 3 A and 3B can be 

20 utilized together or with other combining mechanisms while still practicing the 

present invention. For example, a subset of the total number of values can be folded 
using exclusive-or operators resulting in multiple subset-values, which can then be 
concatenated as shown in Figure 3 A to form set-values. Once a set-value is created, 
any suitable value profiling technique can be used to produce profihng information 

25 such as that shown and described with reference to Figure 4 below. 

Figure 4 shows a profiling data structure for a candidate region. Profiling 
data structure 400 includes top set- values 410 and profile indicators 420 arranged in 
records, and the total number of set-values 430. Record 422 has a set- value shown as 
"A," and has a profile indicator value of 800. Likewise, record 424 has a set- value 

30 shown as "B," and has a profile indicator value of 400. Set-values 410 are shown in 
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Figure 4 having alphanumeric values for ease of explanation. In some embodiments, 
set-values 410 have values that include concatenated register values, such as set- 
value 300 (Figure 3 A). In other embodiments, set-values 410 have values 
corresponding to combined register values, such as set- value 370 (Figure 3B), The 
5 total number of set- values 430 is the sum of all the set- values encountered by the 
profiler at the region entry, including the top set- values and the less fi-equently 
encountered set-values. 

In the embodiment of Figure 4, profihng data structure 400 is shown in a 
state existing after a region has been profiled. Top set-values 410 have been profiled, 

10 and profile indicators 420 show how often each of top set-values 410 was 

encountered. For example, as shown in record 422, top set-value A occurred 800 
times. Likewise, as shown in record 424, set- value B occurred 400 times. The total 
number of set-values 430 is equal to 3000. Five hundred of the 3000 sampled set- 
values did not match set-values in profiling data structure 400 and were discarded. 

1 5 During profiling, when a particular set- value is encountered, profiling data 

structure 400 is accessed as a fimction of set-values and the corresponding profile 
indicator is updated. In this example, the profiling indicator is updated using an 
increment operation. Profiling data structure 400 only keeps a small number of 
distinct set-values. For example, profiling data structure 400 may include only eight 

20 entries. 

The relative probability of occurrence of each top set-value 410 is a fimction 
of the total number of set-values 430 collected firom the region during profiling. For 
example, the sum of all profile indicators 420 maintained in data structure 400 is 
equal to 2500. If the input set of the candidate reuse region were sampled a total of 
25 3000 times resulting in profiling data structure 400, the candidate reuse region may 
be a good candidate for a computation reuse region. The candidate reuse region may 
be a good computation reuse region in part because the top eight set- values as shown 
in data structure 400 account for greater than 80 percent (2500/3000 > .8) of all set- 
values sampled for the candidate reuse region. 
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If, however, profiling data structure 400 results after sampling the input 
register set a total of 20000 times, the candidate reuse region may not be a good 
choice for a computation reuse region. The candidate reuse region may not be a 
good computation reuse region in part because the top eight values as shown in data 
5 structure 400 account for less than 13 percent (2500/20000 < . 13) of all set- values 
sampled for the candidate reuse region. 

In some embodiments, the number of top set-values to profile is a decision 
made prior to profiling the software. The size of profiling data structure 400 is then 
set accordingly. If a processor that will ultimately execute the computation reuse 
10 regions in the end-user environment has the capability to store a large number of 
computation reuse instances, then the number of top set-values profiled can also be 
large. In some embodiments, the size of profiling data structure 400 is at least as 
large as the number of expected reuse instances that will be stored in the end-user 
environment. 

15 Profiling data structure 400 can be implemented in any suitable type of 

physical data structure. In some embodiments, data structure 400 is an array 
sequentially searched by the set- value. In other embodiments, data structure 400 is 
implemented in a hash table. In still other embodiments, data structure 400 is a 
dedicated hardware buffer resident within the processor that performs the profiling 

20 operations. 

Figure 5 A shows a sampling value profiler. As previously described, a good 
computation reuse region can be selected based on the fi-equency of occxirrence of top 
set-values. The firequency of occurrence of top set-values can be ascertained by 
statistically sampling a sufficient number of set-values without sampling every single 
25 one. As shown in embodiment 500, value profiler 506 receives one of every "S" set- 
values firom filter 504. Filter 504 receives a set- value 502 each time a candidate 
reuse region is encountered during profiling, but only passes one of every S set- 
values to value profiler 506. By sampling every S values, an approximation of the 
probability of occurrence of top set-values is generated. Figure 5 A shows the 
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sampling mechanism in schematic fomi. Figure 5B shows an embodiment of a 
sampling profiler using pseudo-code. 

Figure 5B shows instrumenting code that implements the sampling value 
profiler of Figure 5 A. Instrumenting code 520 shows four instructions. In some 

5 embodiments, instrumenting code is inserted in a program being profiled at the 
beginning of a candidate reuse region. Instruction 522 sets one of two predicate 
registers to "true" and the other to "false" based on the outcome of a "compare" 
operation. The two predicate registers include a "true" predicate register shown as 
"pt," and a "false" predicate register shown as "pf." When, in instruction 522, the 

10 variable labeled "counter" is equal to zero, the true predicate register is set, and 
instructions 526 and 528 executed. The execution of instruction 526 results in the 
counter variable being reinitialized to the sampling interval "S," and the execution of 
instruction 528 results in a set-value "V^" being profiled using a profiling function 
labeled "value_profile." Conversely, if the variable labeled "counter" is not equal to 

15 zero, the false predicate register is set, and instruction 524 executes. Each time 
instruction 524 executes, the counter is decremented. 

In some embodiments, much of the code shown in Figure 5B can be shared 
by multiple candidate regions if it is known that the regions will be entered under the 
same condition. In other words, if the entries of the regions are control equivalent. 

20 For example, instructions 522, 524, and 526 can be placed anywhere prior to control 
equivalent candidate region entries and candidate load instructions. Instruction 528 
can be inserted at each of the control equivalent candidate region entries and 
candidate load instructions. In these embodiments, instructions 522, 524, and 526 
are not repeated for each control equivalent candidate region entry and candidate 

25 load instruction. Profiling instructions inserted for profiling each of the control 

equivalent candidate region entries and candidate load instructions are predicated on 
the same predicate register. 

The value of "S" in Figures 5 A and 5B is chosen so that a statistically vahd 
number of samples is collected. For example, if a candidate reuse region is 

30 encountered one million times, and the top eight set-values are to be profiled, a few 



Attorney Docket 884,258usl 



10 



Client Ref. No. P8216 



hundred samples may be sufficient. In this example, S can be set on the order often 
thousand. On the other hand, if a candidate reuse region is encountered only a few 
hundred times, a statistically valid number of samples should not be too small, and S 
can be set smaller accordingly. 
5 In some embodiments, S is set such that a minimum number of samples equal 

to a multiple of the number of top set- values to be collected during profihng. One 
such embodiment is shovra in the pseudo-code that follows. 

S = user selected sampling interval 
1 0 num_samples = region_entry_freq/S 

min_num_samples = K * num_top_set_values 
if (num_samples < min_num_samples) 

S = region_entry_freq/min_num_samples 

15 In the example embodiment shown in pseudo-code above, num samples is 

the number of set-value samples to be taken, and is initially computed as the total 
number of occurrences divided by the initial sample interval, S. A minimum number 
of samples is computed as K times the number of top set-values to be profiled, and if 
the number of samples previously computed is less than the minimum, the sample 

20 interval S is recomputed to satisfy the criteria. In some embodiments, K is greater 
than or equal to ten. 

Embodiments described thus far are generally directed to set- values that can 
aid in the identification of good computation reuse regions. The method and 
apparatus of the present invention can also be utiUzed for profiling location-values 

25 that can aid in the identification of good load and store instructions for inclusion in 
reuse regions. In general, for computation reuse regions, an assumption is made that 
inputs to the computation reuse region are sourced from registers. Load and store 
instructions reference values in memory locations. This is described in more detail 
with reference to Figures 6A-6D below. 
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Figure 6A shows software instructions that access an array. Embodiment 600 
shows software instructions in an end-user program that include a load instruction. 
The load instruction occurs when the array access is made shown as "a[i]" in Figure 
6A. In instruction 602, a variable "x" is initialized to zero. Instruction 604 is the 

5 beginning of a "for" loop, and instruction 608 is the end of the "for" loop. 
Instruction 606 is executed "M" times within the "for" loop. 

Instruction 606 can be a good reuse instruction if a small number of top 
location-values account for a majority of the memory loads. For example, if the 
array "a" is invariant, each time a particular location within the array is accessed, the 

10 value retrieved will be the same. The method and apparatus of the present invention 
collects top memory locations and top load values as a set. The load location and 
loaded value is treated as a combined location-value, and the combined location- 
values are profiled to collect the top location-values for each candidate load or store 
instruction. At each candidate load instruction, a fixed number of location-values are 

15 collected. For example, in some embodiments, 20 location- values are collected for 
each candidate load instruction. 

Figure 6B shows a location- value in accordance with one embodiment of the 
present invention. Location-value 610 shows location 612 concatenated with value 
614. Value 614 is the value loaded from location 612. The combination of location 

20 612 and value 614 represent a location-value to be profiled. The concatenation of 
location 612 and value 614 is similar to the concatenation of values in set-value 300 
(Figure 3A). 

Figure 6C shows a location-value in accordance with another embodiment of 
the invention. Figure 6C shows location-value 370 generated as a fimction of 
25 location 612 and value 614. Location 612 is shown having three segments. Each 
segment represents a portion of the total data word that represents the location, such 
as a single byte in a three byte word. Segments of location 612 are folded using 
exclusive-or operator 616. Likewise, segments of value 614 are folded using 
exclusive-or operator 618. The output of exclusive-or operators 616 and 618 are 
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concatenated to produce location-value 620, Location-value 620 represents the 
combination of the location and the value accessed by a candidate load instruction. 

Location-value 620 may not be unique for each possible combination of 
locations 612 and values 614. For example, two different combinations of locations 
5 and values may produce the same location-value 620. This can decrease the 

accuracy of the resulting profile generated; however, the degraded accuracy is traded 
for increased storage efficiency. In practice, most profiled values are small, and the 
likelihood that two profiled location-values will result in the same location- value 620 
is small. For example, if each of location 612 and value 614 have non-zero values 
10 only in the left-most segment, then no data is lost as a result of the exclusive-or 
operations, and each location-value will be unique. 

Location-value 620 is shown in Figure 6C as being generated using 
exclusive-or operators. One skilled in the art will understand that other mechanisms 
exist for combining locations and values into location-value 620, When exclusive-or 
15 operators or other mechanisms are employed, locations and values are combined into 
a single value, shown as location-value 620 in Figure 6C. 

In some embodiments, exclusive-or operators 616 and 618 are implemented 
in hardware. In some hardware implementations, registers intemal to a processor 
drive exclusive-or circuits that create location-values, such as location-value 620. In 
20 other embodiments, exclusive-or operators 616 and 618 are implemented in software. 
In some software implementations, exclusive-or operators 616 and 618 appear as 
exclusive-or machine instructions inserted into the software as instrumenting code. 

Figure 6D shows a location-value profiling data structure. Profiling data 
structure 650 shows location-values 652 and profile indicators 654 arranged in 
25 records. Record 656 corresponds to location-value "A," and record 658 corresponds 
to location-value "B." As is the case with profiling data structure 400 (Figure 4), 
profile indicators 654 hold the number of times a particular location-value 652 is 
encountered, and total field 660 includes the total nimiber of times location-values 
are sampled. When a small number of top location-values represent a large 
30 percentage of the location- values for a candidate load instruction, then the candidate 
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load instruction can be good for computation reuse. In some embodiments, the 
probability of occurrence of a fixed number of top location-values is compared to a 
threshold, and if the occurrence probability is higher, the candidate load instruction is 
selected for inclusion in a computation reuse region. 
5 Figure 7 shows a flowchart for a method of selecting reuse regions. Method 

700 begins with action 710 when a candidate reuse region is identified within a 
software program. Candidate reuse regions can be identified using any of a number 
of criteria. One such candidate reuse region is shown as candidate reuse region 100 
in Figure 1. 

1 0 In action 720, the software program code is instrumented for profiling. 

Instrumenting for profiling includes inserting instructions in the program that profile 
top set- values and top location- values. In some embodiments, every time a candidate 
reuse region is encoxmtered, the instrumented code profiles a set- value for the 
candidate reuse region. In other embodiments, a samphng filter is employed, such as 

15 filter 504 (Figure 5 A), and only one of every "S" set-values is profiled. 

In action 730, the instrumented code is executed and the profile data is 
gathered. As a result, profiling data structures, such as profiling data structure 400 
(Figure 4), and profiling data structure 650 (Figure 6D) are generated. In action 740, 
the probability of occurrence of a top set-value is determined as the ratio of the 

20 number of times the top set-value was collected to the total number of times set- 
values were sampled. When a small nxmiber of top set-values represent a large 
percentage of the execution of the candidate reuse region, then the candidate reuse 
region will likely make for a good computation reuse region. 

In action 750, the candidate reuse region is used to form a computation reuse 

25 region if appropriate criteria are met. One such criteria is when the probability of 
occurrence of a small number of top set- values exceeds a threshold, A candidate 
reuse region can be used by itself or can be combined with other candidate reuse 
regions to form a computation reuse region. 

Figure 8 shows a processing system. Processing system 800 includes 

30 processor 820 and memory 830. In some embodiments, processor 820 is a processor 
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capable of executing instrumented software for profiling top set- values and top 
location-values. Processor 820 can also be a processor capable of selecting good 
computation reuse regions from candidate reuse regions. Processing system 800 can 
be a personal computer (PC), mainframe, handheld device, portable computer, set- 

5 top box, or any other system that includes software. 

In some embodiments, processor 820 includes cache memory, a memory 
controller, or a combination of the two. In these embodiments, processor 820 may 
access a profile indicator data structure without accessing memory 830. In other 
embodiments, profile indicators are maintained within memory 830, and processor 

10 820 accesses memory 830 when updating profile indicators regardless of whether 
processor 820 includes cache memory or memory controllers. 

Memory 830 can be a random access memory (RAM), read only memory 
(ROM), flash memory, hard disk, floppy disk, CDROM, or any other type of 
machine medium readable by processor 820. Memory 830 can store instructions for 

15 performing the execution of the various method embodiments of the present 
invention. 

Conclusion 

A software profiling mechanism that gathers and profiles top set-values and 
20 top location-values has been described. Software to be profiled is instrumented with 
instructions that sample set- values at the occurrence of candidate reuse regions and 
sample location-values at the occurrence of candidate load instructions. Set-values 
and location-values can be generated as concatenated values, or can be combined 
using mechanisms such as exclusive-or operators. When a small number of top set- 
25 values accoimt for a large percentage of occurrences, the candidate reuse region may 
make a good computation reuse region. Likewise, when a small number of top 
location-values account for a large percentage of occurrences of candidate load 
instructions, the candidate load instruction may make a good candidate for inclusion 
in a computation reuse region. 
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It is to be understood that the above description is intended to be illustrative, 
and not restrictive. Many other embodiments will be apparent to those of skill in the 
art upon reading and understanding the above description. The scope of the 
invention should, therefore, be determined with reference to the appended claims, 
5 along with the full scope of equivalents to which such claims are entitled. 
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What is claimed is: 

11. A computer-implemented method of profiling software comprising: 

2 identifying a candidate reuse region; 

3 determining an input set for the candidate reuse region; 

4 instrumenting the software to profile set-values for the input set; and 

5 executing the instrumented software. 

1 2. The computer-implemented method of claim 1 further comprising: 

2 identifying a candidate load instruction within the candidate reuse region; and 

3 instrumenting the software to profile location- values for the candidate load 

4 instruction. 

1 3. The computer-implemented method of claim 1 wherein the input set 

2 comprises a pluraUty of input registers, and each set- value comprises an input 

3 register value for each of the plurality of input registers, the method further 

4 comprising: 

5 for each set-value, combining each of the input register values into a single 

6 value. 

1 4. The computer-implemented method of claim 3 wherein combining 

2 comprises: 

3 folding each of the input register values to create folded values; and 

4 concatenating the folded values. 

1 5. The computer-implemented method of claim 1 wherein instrumenting 

2 comprises inserting instructions to periodically sample set-values. 

1 6. The computer-implemented method of claim 5 wherein the input-set 

2 comprises a pluraUty of input registers, and each set-value comprises an input 



Attorney Docket 884,258usl 



17 



Client Ref. No. P8216 



3 register value for each of the plurality of input registers, and wherein instrumenting 

4 further comprises: 

5 inserting instructions to combine each of the input register values into a 

6 single value; and 

7 inserting instructions to index into a data structure of profile indicators using 

8 the single value. 

17. The computer implemented method of claim 5 wherein instrumenting further 

2 comprises: 

3 inserting instructions to profile the top N occurring set-values, where N is 

4 chosen as a function of an expected number of reuse mstances. 

1 8. The computer-implemented method of claim 1 further comprising selecting 

2 the candidate reuse region as a computation reuse region. 

19. A machine readable medium including instructions for a method of profiling 

2 software, the method comprising: 

3 identifying a candidate reuse region; 

4 determining an input set for the candidate reuse region; 

5 instrumenting the software to profile set-values for the input set; and 

6 executing the instrumented software. 

1 10. The machine readable medium of claim 9 wherein instrumenting comprises: 

2 inserting instructions to periodically sample set-values. 

1 11. A computer-implemented method of profiling software comprising: 

2 periodically sampling registers to obtain register values; and 

3 storing an occurrence frequency of the register values in a data structure. 
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1 12. The computer-implemented method of claim 1 1 wherein periodically 

2 sampling comprises: 

3 sampling a pluraHty of registers to obtain a set- value every S occurrences of a 

4 candidate reuse region, where S is a sampling period. 

1 13. The computer-implemented method of claim 12 fiirther comprising: 

2 identifying a group of control equivalent candidate region entries and 

3 candidate load instructions; 

4 inserting instructions prior to the group, wherein the instructions set a 

5 predicate register every S occurrences; and 

6 inserting profiling instructions at each of the control equivalent candidate 

7 region entries and candidate load instructions, wherein the profiling instructions are 

8 predicated on the predicate register. 

1 14. The computer-implemented method of claim 12 wherein storing comprises: 

2 accessing a record in the data structure as a fimction of the set- value; and 

3 incrementing a profile indicator at the record. 

1 15. The computer-implemented method of claim 1 2 wherein periodically 

2 sampling further comprises sampling set-values in the plurality of registers at the 

3 beginning of a candidate reuse region, the plurality of registers being input registers 

4 to the candidate reuse region. 

116. A computer-implemented method of profiling software comprising: 

2 identifying a candidate load instruction; 

3 instrumenting the software to sample a location-value every S occurrences of 

4 the candidate load instruction; and 

5 executing the software. 



Attorney Docket 884.258usl 



19 



Client Ref. No, P8216 



1 17. The computer-implemented method of claim 16 wherein instrumenting 

2 comprises: 

3 inserting instructions in the software to count the number of times each 

4 location-value is sampled; and 

5 inserting instructions in the software to keep track of top location-values. 

1 1 8. The computer-implemented method of claim 16 fiirther comprising: 

2 identifying a group of control equivalent candidate region entries and 
^ 3 candidate load instructions; 

4 inserting instructions prior to the group, wherein the instructions set a 

5 predicate register every S occurrences; and 

6 inserting profiling instructions at each of the control equivalent candidate 

7 region entries and candidate load instructions, wherein the profiling instructions are 

8 predicated on the predicated register. 

1 19. The computer-implemented method of claim 17 wherein the candidate region 

2 mcludes a plurality of candidate load instructions, each of the plurality of load 

3 instructions being predicated on a common predicate register. 

1 20. The computer-implemented method of claim 1 7 wherein inserting 

2 instructions to keep track of top location-values includes inserting sampling 

3 instructions configured to profile the top N occurrences of location-values, where N 

4 is an integer. 

121. A machine readable medium including instructions for a method of profiling 

2 software, the method comprising: 

3 identifying a candidate load instruction; 

4 instrumenting the software to sample a location-value every S occurrences of 

5 the candidate load instruction; and 

6 executing the software. 
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1 22. The machine readable medium of claim 21 wherein instrumenting comprises 

2 inserting instructions in the software to count the number of times each location- 

3 value is encountered. 

1 23 . A computer-implemented method of selecting reuse regions within a software 

2 program, the method comprising: 

3 profiling top set-values for candidate reuse regions to produce a probability of 

4 top set-values; and 

5 selecting reuse regions as a fimction of the probability of top set-values. 

1 24. The computer-implemented method of claim 23 wherein profiling set-values 

2 comprises: 

3 representing each top set- value as a single value; and 

4 accessing a data structure as a fimction of the single value to modify a profile 

5 indicator. 

1 25, The computer-implemented method of claim 24 wherein accessing a data 

2 structure comprises accessing a data structure at least as large as a number of 

3 expected reuse instances. 

1 26. The computer-implemented method of claim 25 wherein selecting comprises 

2 marking as reuse regions those candidate reuse regions having a finite number of top 

3 set-values that have a probability of occurrence greater than a threshold. 

1 27. A machine readable medium including instructions for a method of selecting 

2 reuse regions within a software program, the method comprising: 

3 profiUng top set-values for candidate reuse regions to produce a probability of 

4 top set- values; and 

5 selecting reuse regions as a function of the probability of top set-values. 

Attorney Docket 884.258usl 21 Client Ref, No. PS: 



1 28. The machine readable medium of claim 27 wherein profiling set-values 

2 comprises: 

3 representing each top set-value as a single value; and 

4 accessing a data structure as a function of the single value to modify a profile 

5 indicator. 
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Abstract of the Disclosure 

An apparatus and method for profiling candidate reuse regions and candidate 
load instructions aids in the selection of computation reuse regions and computation 
reuse instructions with good reuse quahties. Registers holding input values for 
candidate reuse regions are sampled periodically when the candidate reuse region is 

5 encountered. The register contents are combined into set- values. When a relatively 
small number of set- values accoimt for a large percentage of occurrences, the 
candidate reuse region may be a good computation reuse region. Load instructions 
are profiled for the location accessed and the value loaded. The location and value 
are combined into location-values. The relative occurrence fi:-equency of location- 

10 values can be used to evaluate load instructions as candidate instructions for reuse. 
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As a below named inventor I hereby declare that: my residence, post office address and citizenship are as 
stated below next to my name; that 

I verily believe I am the original, first and sole inventor of the subject matter which is claimed and for which 
a patent is sought on the invention entitled: SOFTWARE SFT -V AT J I R PROFI T JNG AND COPE REUSE 

The specification of which is attached hereto. 

I hereby state that I have reviewed and understand the contents of the above-identified specification, 
including the claims, as amended by any amendment referred to above. 

I acknowledge the duty to disclose information which is material to the patentabihty of this appUcation in 
accordance with 37 C.F.R. § 1.56 (attached hereto). I also acknowledge my duty to disclose all information known 
to blQtnaterial to patentability which became available between a filing date of a prior application and the national or 
PCTflntemational filing date in the event this is a Continuation-In-Part appUcation in accordance with 37 C.F.R. 

§l-f]te)- 

!1 I hereby claim foreign priority benefits under 35 U.S.C. § 1 19(a)-(d) or 365(b) of any foreign apphcation(s) 
for ^tent or inventor's certificate, or 365(a) of any PCT international appUcation which designated at least one 
coufiy other than the United States of America, listed below and have also identified below any foreign appUcation 
for patent or inventor's certificate having a filing date before that of the application on the basis of which priority is 
claiffled: 

No l^ch claim for priority is being made at this time. 

3 I hereby claim the benefit under 35 U.S.C. § 1 19(e) of any United States provisional application(s) Usted 
bel^: 

No such claim for priority is being made at this time. 

I hereby claim the benefit under 35 U.S.C. § 120 or 365(c) of any United States and PCT international 
appUcation(s) listed below and, insofar as the subject matter of each of the claims of this application is not disclosed 
in the prior United States or PCT international appUcation in the manner provided by the first paragraph of 35 U.S.C. 
§ 112, 1 acknowledge the duty to disclose material information as defined in 37 C.F.R. § 1 .56(a) which became 
available between the filing date of the prior appUcation and the national or PCT international filing date of this 
appUcation: 
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§1.56 Duty to disclose information material to patentability. 

(a) A patent by its very nature is affected with a pubhc interest. The public interest is best served, and the most effective patent 
examination occurs when, at the time an application is being examined, the Office is aware of and evaluates the teachings of all information 
material to patentability. Each individual associated with the filing and prosecution of a patent application has a duty of candor and good 
faith in dealing with the Office, which includes a duty to disclose to the Office all information known to that individual to be material to 
patentability as defined in this section. The duty to disclose information exists with respect to each pending claim until the claim is canceled 
or withdrawn from consideration, or the application becomes abandoned. Information material to the patentability of a claim that is 
canceled or withdrawn from consideration need not be submitted if the information is not material to the patentability of any claim 
remainmg under consideration in the application. There is no duty to submit information which is not material to the patentability of any 
existing claim. The duty to disclose all information known to be material to patentability is deemed to be satisfied if all information known 
to be material to patentability of any claim issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by 
§§ T97(b)-(d) and L98. However, no patent will be granted on an appHcation in connection with which fraud on the Office was practiced 
or attempted or the duty of disclosure was violated through bad faith or intentional misconduct. The Office encourages appHcants to 
carefully examine: 

(1) prior art cited in search reports of a foreign patent office m a counterpart application, and 

U (2) the closest information over which individuals associated with the filing or prosecution of a patent application believe any 
Cl pending claim patentably defines, to make sure that any material information contained therein is disclosed to the Office. 

(%l Under this section, information is material to patentability when it is not cumulative to information already of record or being 
madf'Mf record in the application, and 

Tl (1) It establishes, by itself or in combination with other information, a prima facie case of unpatentability of a claim; or 

(2) It refutes, or is inconsistent with, a position the applicant takes in: 

H (i) Opposing an argument of unpatentability rehed on by the Office, or 

H= (ii) Asserting an argument of patentability. 

A pi^a facie case of unpatentability is established when the information compels a conclusion that a claim is unpatentable under the 
prepfpderance of evidence, burden-of-proof standard, giving each term in the claim its broadest reasonable construction consistent with the 
specification, and before any consideration is given to evidence which may be submitted in an attempt to establish a contrary conclusion of 
patentability. 

(c) Individuals associated with the filing or prosecution of a patent application within the meaning of this section are: 

( 1 ) Each inventor named in the application: 

(2) Each attorney or agent who prepares or prosecutes the appHcation; and 

(3) Every other person who is substantively involved in the preparation or prosecution of the appHcation and who is 
associated with the inventor, with the assignee or with anyone to whom there is an obligation to assign the application. 

(d) Individuals other than the attorney, agent or inventor may comply with this section by disclosing information to the attorney, 
agent, or inventor. 



